Multilingual corpora for speech-to-speech translation research
Authors
Abstract
Multilingual spoken language corpora are indispensable for developing new speech-to-speech machine translation (S2SMT) technologies. This paper first discusses characteristics that corpora for S2SMT should have, then surveys existing corpora. Finally, it compares these corpora.
Similar resources
The Development of the Multilingual LUNA Corpus for Spoken Language System Porting
The development of annotated corpora is a critical process in the development of speech applications for multiple target languages. While the technology to develop a monolingual speech application has reached satisfactory results (in terms of performance and effort), porting an existing application from a source language to a target language is still a very expensive task. In this paper we addr...
Linguistic representation of Finnish in a limited domain speech-to-speech translation system
This paper describes the development of Finnish linguistic resources for use in MedSLT, an Open Source medical domain speech-to-speech translation system. The paper describes the collection of the medical sub-domain corpora for Finnish, the creation of the Finnish generation grammar by adapting the original English grammar, the composition of the domain specific Finnish lexicon and the definiti...
Proceedings of Meetings on Acoustics
India possesses a large variety of languages and dialects spoken in different parts of the country. These languages possess some unique linguistic, phonological and phonetic properties different from European languages. Research is being done in several Indian languages such as Hindi, Bangla, etc. to study their articulatory, acoustic, phonetic and prosodic nature for the purpose of creating s...
Statistical speech-to-speech translation with multilingual speech recognition and bilingual-chunk parsing
Initiated mainly by the speech community, research in speech-to-speech (S2S) translation has made steady progress in the past decade. Many approaches to S2S translation have been proposed. Among them, corpus-dependent statistical strategies have been widely studied in recent years. In corpus-based translation methodology, rather than taking the corpus just as reference templ...
Improving Translation Lexicon Induction from Monolingual Corpora via Dependency Contexts and Part-of-Speech Equivalences
This paper presents novel improvements to the induction of translation lexicons from monolingual corpora using multilingual dependency parses. We introduce a dependency-based context model that incorporates long-range dependencies, variable context sizes, and reordering. It provides a 16% relative improvement over the baseline approach that uses a fixed context window of adjacent words. Its Top...